Adding Query Privacy to Robust DHTs 

Michael Backes (1,2), Ian Goldberg (3), Aniket Kate (1), Tomas Toft (4)

(1) Max Planck Institute for Software Systems (MPI-SWS), Germany
(2) Saarland University, Germany
(3) University of Waterloo, Canada
(4) Aarhus University, Denmark



Abstract 




Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions aim solely at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries.

We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on communication over robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal of obtaining query privacy over robust DHTs. Finally, we compare the performance of our privacy-preserving protocols with their more privacy-invasive counterparts. We observe that there is no increase in the message complexity and only a small overhead in the computational complexity.

Keywords: Distributed hash tables, Query privacy, Spam resistance, Oblivious transfer

1 Introduction



In the digital society, our online activities are persistently recorded, aggregated, and analyzed. Although worldwide electronic data privacy laws and organizations such as EFF [1] and EPIC [2] try to challenge this pervasive surveillance through policies and protests, privacy enhancing technologies (PETs) are key components for establishing a suitable privacy protection mechanism from the technology side. The interest in developing novel PETs is increasing for a variety of reasons, ranging from the desire to share and access copyrighted information without revealing one's network identity, to scalable anonymous web browsing [29-31, 35, 50]. In this paper, we study privacy in the peer-to-peer (P2P) paradigm, a popular approach to providing large-scale decentralized services.

In the P2P paradigm, distributed hash tables (DHTs) [38, 41, 47, 54] are the most common methodology for implementing structured routing. Similar to hash tables, a DHT is a data structure that efficiently maps keys onto values that are stored over a distributed overlay network. However, unlike hash tables, DHTs can scale to extremely large numbers of key-value pairs, as the mapping from keys to values is distributed among all peers. In order to obtain a value associated with a key, a requester (a sender) in a DHT routes the key through a small fraction of the network to reach the receiver that has stored the value. DHTs can also handle continual arrivals and departures of peers, and small-scale modifications to the set of peers do not disturb the mapping from keys to values significantly.

In a DHT, privacy may be expected for the sender, the receiver or the queried key. Ensuring the anonymity of senders and requesters in DHTs has received considerable attention in the privacy community [29-31, 35, 50]. Privacy of the queries/keys, i.e., keeping the keys secret from intermediate peers that route the queries towards their destinations, is equally important: in many scenarios such as censorship resistance, this query privacy constitutes a necessary condition for sender and requestor anonymity. In this paper, we present a practical approach to obtain privacy for queries in robust and spam-resistant DHTs where a fraction of peers may behave maliciously.

1.1 Contributions 

Almost all anonymity solutions for DHTs try to provide anonymity to a sender or a requester in a DHT 
lookup, upload or request. It may also be necessary that the queried key remains secret from peers that 
route the corresponding requests in some situations. We call this property query privacy. In this case, an 
intermediate routing peer should be able to suggest a next peer or a set of next peers without determining 
the key being searched for. Example application scenarios for this property include protection against mass surveillance or censorship, prevention of tracking and data-mining activities on users' requests, and opportunities to access material that is socially deplored, embarrassing or problematic in society.

Recursive routing and iterative routing are the two approaches to route information in DHTs. In the 
recursive routing approach, obtaining query privacy looks infeasible, if not impossible. This results from 
the fact that the intermediate router itself decides to which peer to forward a request. Assuming that every 
peer is under the control of an individual, it is always possible for the controller to figure out the next peer 
for the request. On the other hand, query privacy in iterative routing, which is also a commonly used routing 
approach over robust DHTs, can be trivially obtained if every peer sends its complete routing table to the 
requesting peer. The requester can then determine the next peer itself and send the request. However, this solution may make it significantly easier to mount spamming attacks in the system: a malicious sending peer can easily gather a significant amount of routing information, and use it to determine and target peers that hold specific keys.

We instead use the oblivious transfer (OT) primitive [17, 37]. Given a peer holding a database, OT allows a requester with a key to obtain a database entry associated with the key, such that the requester does not get any information about other database entries and the database owner does not learn the requester's key. Therefore, OT perfectly fits our requirements of obtaining query privacy without divulging additional routing information. We use the OT protocol by Naor and Pinkas [32] in the best-known robust DHT constructions by Young et al. [52, 53] (their RCP-I and RCP-II protocols) to obtain our goal of query privacy in robust DHTs. Importantly, our query privacy mechanism does not increase the message complexity of the RCP-I and RCP-II protocols, and the increase in computation cost is not significant. We elaborate on our exact choice of OT protocol in Section 4.3 and discuss robust DHT constructions in Section 2.

The employed OT protocol [32] is a simple indexed OT protocol, where the database contains index-value pairs and a requester inputs an index. However, a query for a routing table entry is an interval membership (or range) query and not an index query. Therefore, to prevent a requesting peer from obtaining any additional information, we could have employed the concept of conditional OT (COT) [15] and used the interval-membership strong COT (I-SCOT) protocol by Blake and Kolesnikov [8]. However, the I-SCOT protocol is expensive in terms of both computation and communication. We observe that large improvements can be made by releasing the upper and lower bounds of the routing table entries to the requester in the clear. These range boundary values are of no use to a malicious requester for spamming, as they do not convey any information regarding the identities of the key owners. However, given this information, the requester knows the index of the desired entry, allowing the use of OT instead of the more complex COT.
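Because the range boundaries are public, the requester can map its key to an entry index locally before any OT takes place. The following Python sketch illustrates this step under assumptions that are not taken from the paper: each entry is identified by the upper bound of its range on a small identifier ring, and the function name and sample bounds are hypothetical.

```python
from bisect import bisect_left

def entry_index(upper_bounds, key):
    """Return the index of the routing-table entry whose range contains key.

    Assumption: upper_bounds is sorted and entry i covers the interval
    (upper_bounds[i-1], upper_bounds[i]]; keys beyond the last bound wrap
    around the identifier ring to entry 0.
    """
    i = bisect_left(upper_bounds, key)
    return i % len(upper_bounds)

# Hypothetical boundaries released in the clear by a quorum (8-bit ring).
bounds = [32, 64, 128, 192, 255]
index = entry_index(bounds, 100)   # 64 < 100 <= 128, so this is entry 2
```

Only `index` is then used as the chooser's input to the OT; the boundaries themselves reveal nothing about which peers own which keys.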



Private information retrieval (PIR) [13] is a weaker form of OT, where more information may be revealed than asked for; e.g., sending the complete routing table is a trivial PIR protocol. PIR protocols can be less costly than OT in terms of computation, but the risk of spamming persists with all non-OT PIR protocols, and hence we avoid them.



Outline. The rest of the paper is organized as follows. In Section 2, we survey the literature on robust DHTs. Section 3 describes our system model, while Section 4 overviews the cryptographic tools used in our constructions. In Section 5, we present the robust communication protocols that preserve query privacy. In Section 6, we analyze and discuss performance and systems issues. Finally, we conclude in Section 7. We include a detailed description of the employed OT protocol in Appendix A.

2 Background and Related Work 

Malicious behaviour is now common over the Internet. The lack of admission control mechanisms in DHT systems makes them particularly vulnerable to these malicious (or Byzantine) attacks [45, 49]. Such attacks can not only pollute the data that is available over DHTs [25], but also poison the indices by creating fake data identifiers [26]. Attackers may further create Sybil identities and disrupt communication between well-behaving peers by spamming. The concern is quite serious, since large-scale P2P systems in existence today (e.g., Azureus or Vuze DHT [18] or KAD DHT [46]) see millions of users every day. Along with the basic file sharing application, there are proposals for using P2P systems to re-implement the Domain Name System [48], mitigate the impact of computer worms [3] and protect archived data [20, 51]. These applications would benefit from tolerance against Byzantine behaviors. As a result, a number of solutions have been defined that can provably tolerate Byzantine faults over P2P systems (e.g., [4-7, 19, 22, 23, 33, 42, 52]). Due to the popularity of DHTs, the majority of these solutions are built to work over DHTs and the resulting constructions are called robust DHTs.

In robust DHTs, malicious attacks are generally dealt with using the concept of quorums [4-7, 19, 22, 33, 42]. A quorum is a set of peers such that a minority of the members suffer adversarial faults. Typically, it consists of $O(\log n)$ nodes, where $n$ is the total number of nodes in the underlying DHT. A DHT quorum replaces an individual DHT peer as the atomic unit, and malicious behaviour by an adversary is overcome by majority action; e.g., the content may be stored in a distributed and redundant fashion across members of a quorum such that it cannot be polluted by a small fraction of host peers. Poisoning attacks can be mitigated by having peers belonging to the same quorum validate routing information before it is advertised. If a peer violates the protocol, then it is possible to remove it from the quorum, which effectively removes it from the system.

Protocols using quorums are split between those that use iterative and those that use recursive approaches. When sending a request using the recursive approach, a sending peer contributes one message (its request), while its DHT has to generate $O(\log n)$ messages. In the iterative approach, a sending peer has to contribute an equal number of messages as its DHT. Consequently, while dealing with Byzantine faults, the iterative approach is more common than the recursive approach, as the former provides better protection against the spamming attack than the latter.

The common way such quorums are utilized is as follows: a request $m$ originating from a peer $p$ traverses a sequence of quorums $Q_1, Q_2, \ldots, Q_\ell$ until a destination peer is reached. A typical example is a query for content where the destination is a peer $q$ holding a data item. Initially $p$ notifies its own quorum $Q_1$ that it wishes to transmit $m$. Each peer in $Q_1$ forwards $m$ to all peers in $Q_2$. Every peer in $Q_2$ determines the correct message by majority filtering on all incoming messages and, in turn, forwards it to all peers in the next quorum. This forwarding process continues until the quorum $Q_\ell$ holding $q$ is reached. Assuming a majority of correct peers in each quorum, transmission of $m$ is guaranteed.

Unfortunately, this simple protocol is costly. If all quorums have size $\eta$ and the path length is $\ell$, then the message complexity is $\ell\eta^2$. Typically, for a DHT of $n$ nodes, $\eta = O(\log n)$ and, as in Chord [47], $\ell = O(\log n)$, which gives $O(\log^3 n)$ messages; this is likely too costly for practical values of $n$. Saia and Young [42] mitigate this problem using a randomized protocol which provably achieves $O(\log n)$ messages in expectation; however, the constants in their protocols are prohibitively large.
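The naive quorum-to-quorum forwarding and its quadratic per-hop cost can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are ours, not the paper's: every quorum has exactly the same size, faulty peers all forward the same corrupted message, and the count covers the $\ell - 1$ inter-quorum hops (so it is $(\ell - 1)\eta^2$ rather than the asymptotic $\ell\eta^2$). All names are hypothetical.

```python
from collections import Counter

def majority(messages):
    """Return the value sent by a strict majority of senders, else None."""
    value, count = Counter(messages).most_common(1)[0]
    return value if count > len(messages) / 2 else None

def forward(m, path_len, eta, faulty_per_quorum):
    """Relay m across path_len quorums of size eta.

    Honest peers forward the majority-filtered message, faulty peers forward
    garbage. Returns (delivered_message, inter_quorum_messages_sent).
    """
    sent = 0
    current = m
    for _ in range(path_len - 1):              # hops Q_1 -> Q_2 -> ... -> Q_l
        honest = eta - faulty_per_quorum
        outgoing = [current] * honest + ["garbage"] * faulty_per_quorum
        sent += eta * eta                      # all-to-all between adjacent quorums
        current = majority(outgoing)           # what each receiving peer accepts
    return current, sent

delivered, sent = forward("lookup", path_len=20, eta=20, faulty_per_quorum=6)
```

With 6 faulty peers out of 20 per quorum the message survives every hop, at the price of 400 messages per hop.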





Peers $u$, $v$, and $w$ are in the routing table ($\mathcal{RT}$) of peer $p$ on a DHT. Correspondingly, in a quorum topology with $p \in Q_1$, $u \in Q_2$, $v \in Q_3$ and $w \in Q_4$, quorums $Q_2$, $Q_3$, and $Q_4$ are linked to quorum $Q_1$ in its $\mathcal{RT}$. Thick lines signify inter-quorum links. Each quorum has size $\eta = \Omega(\log n)$ and must have strictly fewer than 1/3 faulty peers.

Figure 1: Quorum Topology over DHTs



Recently, Young et al. [52] demonstrated that the problem can be solved using threshold cryptography [14]. Using a distributed key generation (DKG) protocol over the Internet [24] and a threshold digital signature scheme [9], they design two robust communication protocols, RCP-I and RCP-II, that respectively require $O(\log^2 n)$ messages and $O(\log n)$ messages in expectation. Importantly, these protocols can tolerate adversarial peers up to any number less than 1/3 of a quorum in the asynchronous communication setting and less than 1/2 of a quorum in the synchronous communication setting. They also do not require any trusted party or costly global updating of public/private keys outside of each quorum. The protocols work in the elliptic curve cryptography (ECC) based discrete logarithm setting, and their security is based on the gap Diffie-Hellman (GDH) assumption [11]. The paper also includes results from microbenchmarks conducted over PlanetLab showing that these protocols are practical for deployment under significant levels of churn and adversarial behaviour. We find this work to be the most up-to-date solution for robust and spamming-resistant communication in DHTs and use it as a starting point towards query privacy.



Privacy in communication over DHTs has also been under consideration over the last few years [29-31, 34, 35, 50]. However, most of these PETs concentrate on sender (or requester) privacy, and generally aim at a scalable anonymous web browsing system: a future replacement for Tor [16]. Our aim in this paper is different; we want to achieve privacy for keys in DHT queries (or query privacy). Nevertheless, we observe that our query privacy mechanism can further enhance anonymity in almost all of the above PETs. Our approach is also significantly better in terms of message complexity than redundant routing [34], where a requester makes multiple queries to confuse an observer.



3 System Model and Assumptions 

In this section, we discuss the quorum-based DHT system model, and the adversary and communication assumptions that we make in our protocols. As we develop our anonymity solution on top of the robust communication protocols by Young et al. [52], our model is nearly the same as their model.

For ease of exposition, we do not consider the link failures and crash-recovery mechanism in that work, which in turn follows from the underlying DKG architecture [24]. However, our protocols indeed work even under these assumptions without any modification.



3.1 Adversary and Communication Assumptions 

We work in the asynchronous communication model (unbounded message delays) with Byzantine faults. However, to ensure the liveness of the protocols, we need the weak synchrony communication assumption by Castro and Liskov [12], which states that the message delay does not grow longer indefinitely. Note that this assumption arises from the underlying robust communication protocols and is unrelated to our privacy preserving mechanism.

In a P2P system, each peer is assumed to have a unique name or identifier $p$ and an IP address $p_{addr}$. Peers $p$ and $q$ can communicate directly if each has the other in its routing table ($\mathcal{RT}$).

Similar to the majority of anonymous communication networks [16, 29, 30, 35], we do not assume a global adversary that can control the whole network and break anonymity by observing all communication by every peer. Such an adversary seems impractical in large-scale geographically distributed DHT deployments. However, we assume that our partial adversary knows the network topology and controls a small fraction of the DHT peers. Following prior works [39, 40, 43, 52, 53], we consider around 10% of all peers to be under adversarial control. The adversary cannot observe communication at the majority of nodes; however, it may try to break query privacy, spam honest nodes, or disrupt the communication by actively attacking traffic that reaches peers under its control.

We assume that the 10% adversarially controlled nodes are spread out evenly over the DHT, and that strictly less than 1/3 of the peers in any quorum are faulty, which is the best possible resiliency in the asynchronous setting. This bound on the adversary is possible using mechanisms like the cuckoo-rule developed by Awerbuch and Scheideler [6], which restricts the adversary from acquiring many peers in the same quorum. Further, all faulty peers in a quorum may be under the control of a single adversary, and collude and coordinate their attacks on privacy, safety and liveness.

Finally, our adversary is computationally bounded with security parameter $k$. We assume that it is infeasible for the adversary to solve the GDH problem [11] in an appropriate setting for signatures and the decisional Diffie-Hellman (DDH) problem [10] in another setting for OT.

3.2 Quorums 

In a variety of approaches used to maintain quorums, one may view the setup of quorums as a graph where vertices correspond to quorums and edges correspond to communication capability between quorums. This is referred to as the quorum topology in the literature. Figure 1 shows how quorums can be linked in a DHT such as Chord [47].

We assume the following four standard invariants [52] are true for the quorum topology under consideration:

Goodness. Each quorum has size $\eta = \Omega(\log n)$ and must have strictly fewer than 1/3 faulty peers.

Membership. Every peer belongs to at least one quorum.

Intra-Quorum Communication. Every peer can communicate directly with all other members of its quorums.

Inter-Quorum Communication. If $Q_i$ and $Q_j$ share an edge in the quorum topology, then $p \in Q_i$ may communicate directly with any member of $Q_j$ and vice versa.

To the best of our knowledge, no practical implementation of a quorum topology yet exists. However, as indicated in the literature [5-7, 19, 33], maintaining the above four invariants looks plausible in real-world DHTs.

In a DHT where the above four invariants are maintained, the general communication mechanism in Young et al. [52] works as shown in Figure 2. Assume that a peer $p$ wants to send a query $m$ associated with a key that belongs to quorum $Q_\ell$, which it does not know. The recipients of the request are generally a set of peers $D \subseteq Q_\ell$. Peer $p$ requests authorization from peers in its quorum $Q_1$. These authorizations are based on a rule set [19] that defines acceptable behavior in a quorum (e.g., the number of data lookup operations a peer may execute during a predefined time period). This rule set is known to every peer within a quorum and is possibly the same across all quorums; it reduces the impact of spamming attacks. If authorized, peer $p$ receives Proof($Q_1$) in the form of a signature. It then sends this proof to the quorum $Q_2$ from its routing table that is responsible for the key being searched for. One or more members of $Q_2$ verify the signature and provide $p$ with routing information and a Proof($Q_2$) for $Q_3$, which will convince $Q_3$ that $p$'s actions are legitimate (i.e., approved by its quorum). The protocol continues until $p$ reaches $Q_\ell$.

A peer $p$ sequentially communicates with $Q_1$, $Q_2$, and so on, until it reaches $Q_\ell$, which owns the searched-for key.
Figure 2: Iterative Communication in Robust DHTs using Quorums
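As an illustration of the kind of rule set mentioned above, the following Python sketch limits the number of lookups a peer may issue within a time window. The class name, the window semantics and the parameters are assumptions made for this example; the actual rule sets are defined in [19] and are not reproduced here.

```python
from collections import deque

class LookupRule:
    """Allow at most max_lookups requests per peer within window seconds."""

    def __init__(self, max_lookups: int, window: float):
        self.max_lookups = max_lookups
        self.window = window
        self.history = {}            # peer id -> deque of accepted time stamps

    def authorize(self, peer: str, ts: float) -> bool:
        times = self.history.setdefault(peer, deque())
        while times and ts - times[0] >= self.window:
            times.popleft()          # forget requests that left the window
        if len(times) >= self.max_lookups:
            return False             # rule violated: no signature share is issued
        times.append(ts)
        return True

rule = LookupRule(max_lookups=2, window=60.0)
decisions = [rule.authorize("p", t) for t in (0.0, 10.0, 20.0, 61.0)]
```

A quorum member would only contribute its signature share to Proof($Q_1$) when `authorize` returns `True`.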

As mentioned in Section 2, it is possible to achieve robust communication without using any of the above cryptography. However, the use of cryptography provides efficiency and reduces the message complexity by at least a linear factor. Note that we do not discuss membership update operations for quorums in this paper, as they remain exactly the same as those in previous work [52, 53].



4 Cryptographic Tools 

Here, we describe the cryptographic tools that we use in our solution. In particular, we review distributed 
key generation, threshold signature and oblivious transfer protocols. 



4.1 Threshold Signatures 

The use of distributed key generation (DKG) and threshold signatures in our privacy preserving schemes comes from the underlying robust DHT architecture. In this architecture, threshold signatures are used to authenticate the communication between quorums. In an $(\eta, t)$-threshold signature scheme, a signing (private) key $sk$ is distributed among $\eta$ peers either by a trusted dealer (using verifiable secret sharing) or in a dealerless fashion (using DKG). Along with private key shares $sk_i$ for each party, the distribution algorithm also generates a verification (public) key $PK$ and the associated public key shares $\widehat{PK}$. To sign a message $m$, any subset of $t + 1$ or more peers use their shares to generate the signature shares $\sigma_i$. Any party can combine these signature shares to form a message-signature pair $S = (m, \sigma) = [m]_{sk}$ that can be verified using the public key $PK$.

In this work, we refer to a message-signature pair $S$ as a signature. Further, it is possible to verify the individual signature shares $\sigma_i$ using the public key shares $\widehat{PK}$. We assume that no computationally bounded adversary that corrupts up to $t$ peers can forge a signature $S' = (m', \sigma')$ for a message $m'$. Malicious behaviour by up to $t$ peers cannot prevent generation of a signature.
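The share-and-combine mechanics of an $(\eta, t)$-threshold signature can be sketched with Shamir sharing and Lagrange interpolation in the exponent. This is a toy illustration only: it uses a tiny prime-order subgroup of the integers modulo a prime instead of a pairing-friendly elliptic curve, a trusted dealer instead of DKG, and it checks the result against the secret key directly because pairing-based BLS verification is out of scope here. All names and parameters are hypothetical.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4   # toy group: G generates the subgroup of prime order Q mod P

def hash_to_group(message: bytes) -> int:
    e = 1 + int.from_bytes(hashlib.sha256(message).digest(), "big") % (Q - 1)
    return pow(G, e, P)

def deal_shares(sk, eta, t, rng):
    """Shamir-share sk with a random degree-t polynomial f; peer i gets f(i)."""
    coeffs = [sk] + [rng.randrange(Q) for _ in range(t)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
            for i in range(1, eta + 1)}

def sign_share(sk_i, message):
    return pow(hash_to_group(message), sk_i, P)

def combine(sig_shares):
    """Lagrange-interpolate t + 1 signature shares in the exponent at x = 0."""
    sigma = 1
    for i, s_i in sig_shares.items():
        lam = 1
        for j in sig_shares:
            if j != i:
                lam = lam * j * pow((j - i) % Q, -1, Q) % Q
        sigma = sigma * pow(s_i, lam, P) % P
    return sigma

rng = random.Random(7)
eta, t = 7, 2
sk = rng.randrange(1, Q)
shares = deal_shares(sk, eta, t, rng)
msg = b"p|p_addr|ts_1"
signature = combine({i: sign_share(shares[i], msg) for i in (2, 5, 6)})  # any t + 1 peers
```

Any other subset of $t + 1$ peers yields the same `signature`, which is what lets the requester combine whichever valid shares arrive first.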

Among three known practical threshold signature schemes [9, 21, 44], Young et al. employed the threshold version [9] of the Boneh-Lynn-Shacham (BLS) signature scheme [11] for their robust DHT design. They reason that, unlike Shoup's construction [44], the key generation in the threshold BLS signature scheme does not mandate a trusted dealer, and unlike Gennaro et al.'s construction [21], the signing protocol in the threshold BLS signature scheme does not require any interaction among peers or any zero-knowledge proofs. They also mention the efficiency of the BLS signature scheme in terms of size and generation algorithm as compared to the other options, and employ it to authenticate the communication between the quorums.

4.2 Distributed Key Generation — DKG 

As a trusted party is not feasible in the P2P paradigm, the underlying robust DHT architecture also needs a complete distributed setup in the form of DKG to generate distributed signing keys. An $(\eta, t)$-DKG protocol allows a set of $\eta$ nodes to construct a shared secret key $sk$ such that its shares $sk_i$ are distributed across the nodes and no coalition of fewer than $t$ nodes may reconstruct the secret. In the discrete logarithm setting, there is also an associated public key $PK$ and a set of public key shares $\widehat{PK}$ in DKG for verification, as required for threshold signatures.

For the robust DHT architecture, Young et al. use a DKG protocol [24] defined for use over the Internet. We continue to use threshold BLS signatures over this DKG setup in our privacy preserving enhancement.

4.3 Oblivious Transfer— OT 

The first notion of oblivious transfer was introduced in 1981 by Rabin [37]. A 1-out-of-2 oblivious transfer (OT) [17] allows a chooser¹ $p$ to decide between two messages held by a server² $q$. Moreover, OT protocols also guarantee that the server learns nothing, while the chooser obtains at most one of the messages. The concept may be generalized to 1-out-of-$\nu$ OT, where $q$ holds $\nu$ messages from which $p$ may pick only one. In this work we will use this to obtain the relevant entry of the routing table from a quorum; the use of oblivious transfers ensures that the query remains secret, while at the same time spamming is prevented, since a malicious $p$ is guaranteed to receive only a single entry.

We utilize an OT protocol by Naor and Pinkas [32, Protocol 3.1] as it fulfills all our needs; see Appendix A for an overview. The protocol provides 1-out-of-$\nu$ string OT, as we require. It is round optimal and requires only one message per party (OT-request from $p$ and OT-response from $q$), except an OT-setup message that we may piggyback in the surrounding protocol. Moreover, it requires no zero-knowledge proofs, and also works in the elliptic curve cryptography (ECC) setting. The computation complexity of the protocol is dominated by the number of exponentiations; both server and chooser must on average perform two of these. In addition to the low computational costs, the overall communication amounts to roughly $2\nu$ group elements.

¹ This is sometimes denoted "receiver" in the OT literature; we use the term "chooser" to avoid confusion with the overall receiver of message $m$ in the surrounding DHT protocol.

² This is typically denoted "sender" in the OT literature; we use the term "server" to avoid confusion with the overall sender of message $m$ in the surrounding DHT protocol.

The construction of Naor and Pinkas allows transfer of group elements; i.e., strings of approximately 
256 bits in the ECC setting. This is not sufficient for an entire entry of a routing table. Rather than increasing 
the group size or performing multiple OTs, we simply let a peer q symmetrically (AES) encrypt each entry 
of the routing table using a random key. The encrypted table is then sent to peer p who uses an OT execution 
to obtain the AES key for the relevant entry from peer q. 
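The hybrid approach described above (encrypt every entry under its own key, then transfer a single key obliviously) can be sketched as follows. This is a toy illustration and not the paper's implementation: the OT follows the commonly described shape of a Naor-Pinkas-style 1-out-of-N protocol, the group is far too small for real use, a SHA-256-derived XOR pad stands in for AES (which is not in the Python standard library), and the entries and helper names are hypothetical. The authoritative protocol description is the one in [32] and Appendix A.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4   # toy group (subgroup of prime order Q modulo the prime P)

def kdf(*parts) -> bytes:
    return hashlib.sha256(repr(parts).encode()).digest()

def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))

# Server q: hypothetical routing-table entries (at most 32 bytes each here).
entries = [b"Q2 @ 10.0.0.2", b"Q3 @ 10.0.0.3", b"Q4 @ 10.0.0.4"]
keys = [secrets.token_bytes(16) for _ in entries]          # one fresh key per entry
encrypted_table = [xor(e, kdf("entry", k)) for e, k in zip(entries, keys)]

# OT-setup (server): random constants C_1..C_{N-1} and g^r.
r = 1 + secrets.randbelow(Q - 1)
C = [None] + [pow(G, 1 + secrets.randbelow(Q - 1), P) for _ in entries[1:]]
g_r = pow(G, r, P)

# OT-request (chooser p): wants entry `choice` without revealing it.
choice = 2
k = 1 + secrets.randbelow(Q - 1)
pk_choice = pow(G, k, P)
pk0 = pk_choice if choice == 0 else C[choice] * pow(pk_choice, -1, P) % P

# OT-response (server): mask key i with a pad derived from (PK_i)^r.
R = secrets.token_bytes(16)
pk0_r = pow(pk0, r, P)
pads = [pk0_r] + [pow(C[i], r, P) * pow(pk0_r, -1, P) % P
                  for i in range(1, len(entries))]
ot_response = [xor(keys[i], kdf(pads[i], R, i)) for i in range(len(entries))]

# Chooser: only (g^r)^k = (PK_choice)^r is computable, so only one key opens.
key = xor(ot_response[choice], kdf(pow(g_r, k, P), R, choice))
entry = xor(encrypted_table[choice], kdf("entry", key))
```

The server sees only `pk0`, which is a uniformly distributed group element regardless of `choice`, while the chooser can derive the pad for exactly one index.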



For protocol RCP$_{qp}$-I in Section 5.2, we will require a chooser $p$ to run an OT with multiple members of the same quorum. We could reduce $p$'s computation by ensuring that all parallel OT instances are verbatim copies here. This would naturally require that all servers use the same source of randomness for OT-setup and for AES keys. This can be achieved easily using a parameterized pseudorandom function (PRF) $\phi(r, \cdot)$. The private key $r$ required for $\phi$ can easily be agreed upon as part of a DKG execution, as it should be known to all quorum members. When the quorum executes an OT instance with chooser $p$, it may use $p$'s message itself as an input to the PRF $\phi$. This PRF-based modification does not have any effect on the OT security proof, as all parallel OT instances are verbatim copies.
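A minimal sketch of this idea, assuming that the PRF $\phi(r, \cdot)$ is instantiated with HMAC-SHA256 (the paper does not fix an instantiation) and that the key, labels and message encoding below are purely illustrative:

```python
import hashlib
import hmac

def prf(r: bytes, label: bytes, request: bytes, n_bytes: int = 32) -> bytes:
    """phi(r, .) instantiated, as an assumption, with HMAC-SHA256."""
    return hmac.new(r, label + b"|" + request, hashlib.sha256).digest()[:n_bytes]

r = bytes.fromhex("00112233445566778899aabbccddeeff")   # hypothetical quorum-wide PRF key
request = b"p|p_addr|ts_2|OT-init"                       # p's message, used as PRF input

# Two different quorum members derive identical per-entry keys for this request.
member_a = [prf(r, b"aes-key-%d" % i, request, 16) for i in range(3)]
member_b = [prf(r, b"aes-key-%d" % i, request, 16) for i in range(3)]
```

Since every member derives the same values from the same request, the chooser can treat the parallel OT instances as copies of one another, while a different request yields unrelated randomness.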

Other Possibilities. A natural question to ask is whether OT is really required, or whether another protocol could achieve the desired goal more efficiently. Although PIR protocols appear to be an alternative, they are not an acceptable alternative because they leak routing information. Further, computational PIR protocols have similar cost as the selected OT protocol [32]. For that matter, most non-trivial PIR is essentially OT as well.

Theoretically better OT protocols also exist, e.g., Lipmaa's OT protocol [27], which provides 1-out-of-$\nu$ OT with $O(\log^2 \nu)$ multiplicative overhead on communication (of a single entry). For the proposed protocol by Naor and Pinkas, the overhead is linear which, in theory, is clearly worse. However, our approach is better in the present setting when we consider numbers from real-world DHTs. With more than a million peers in a practical DHT, we will have $\nu \approx 20$. For $\nu \approx 20$, $\log^2 \nu \approx 20$, and linear communication without any hidden constant is quite acceptable. A generic 1-out-of-2 OT protocol of Peikert et al. [36] requires only two messages, and roughly five exponentiations per party. However, this is still more than the amortized cost of the 1-out-of-$\nu$ OT of Naor and Pinkas and we do not use it. While we cannot rule out the possibility of a more efficient protocol, it seems highly unlikely.
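The numbers in this comparison can be checked with a few lines of arithmetic. The sketch assumes base-2 logarithms and a Chord-like table with about $\log_2 n$ entries, and it ignores all hidden constants; both are our assumptions for illustration.

```python
import math

entries_for_million_peers = math.log2(1_000_000)   # about 19.9, hence nu of roughly 20
nu = 20
polylog_overhead = math.log2(nu) ** 2               # O(log^2 nu) factor, about 18.7
linear_overhead = nu                                # linear factor of the chosen OT
```

For tables of this size the two overheads are of the same magnitude, so the asymptotic advantage of the polylogarithmic protocol does not materialize.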

Finally, hiding the range values in routing table entries seems possible, but it is most likely infeasible in practice. Blake and Kolesnikov [8] provide a 1-out-of-2 conditional OT (COT) based on the greater-than relation. Their protocol has a blowup of a factor linear in the bitlength of the key. This blowup is needed in order to compute the greater-than relation. In addition to this, there are two critical issues that must be solved before COT can be used for hiding the range values: 1) the present work requires a 1-out-of-$\nu$ conditional oblivious transfer, whereas [8] provides only the 1-out-of-2 case; 2) the protocols of [8] are only secure against semi-honest adversaries. Neither seems impossible to solve, but both appear to incur a significant blowup. Nevertheless, as no routing information is leaked through range boundaries, we need not consider these.

5 Adding Query Privacy 

Young et al. [52] present two robust communication protocols using quorums and threshold cryptography: RCP-I and RCP-II. As described in Section 3.2, both these protocols work in the general communication architecture shown in Figure 2. They use threshold BLS signatures over the DKG architecture explained in Sections 4.1 and 4.2. In this section, we provide query privacy to the above protocols using the OT primitive explained in Section 4.3 to define protocols RCP$_{qp}$-I and RCP$_{qp}$-II.



5.1 System Setup 

We start our discussion by describing the setup required for our protocols. For clarity of description, we also briefly review routing tables ($\mathcal{RT}$) in quorum-based DHTs.

Initiation. Before the system becomes functional, the initiator has to choose appropriate groups and other 
setup parameters for the BLS signature and OT protocols. Note that there are no trust assumptions 
required during this step, as these parameters can be selected from the well-known standards. 

Distributed Key Generation. A DKG instance is executed when a quorum is formed in the DHT. At the end of an execution, each quorum $Q_i$ is associated with a (distributed) public/private key pair $(PK_{Q_i}, sk_{Q_i})$. Note that only those quorums linked to $Q_i$, and not everyone in the network, need to know $PK_{Q_i}$. Further, every peer $p \in Q_i$ possesses a private key share $(sk_{Q_i})_p$ of $sk_{Q_i}$. Unlike the quorum public key $PK_{Q_i}$, which must be known to all quorums to which $Q_i$ is linked in the quorum topology, only the members of $Q_i$ need to know the corresponding public key shares $\widehat{PK}_{Q_i}$. The private key $r$ of the PRF $\phi(r, \cdot)$ required in RCP$_{qp}$-I can easily be generated during this DKG execution.

Routing Table Setup. Without loss of generality, we assume a Chord-like DHT [47]. When a quorum is formed in the DHT, it determines its neighbors and forms its routing table $\mathcal{RT}$. For a quorum $Q_i$, each entry of its routing table has the form $\mathcal{RT}_{Q_i} = [Q_j, p, p', PK_{Q_j}, ts]$. In this entry, peer $p \in Q_j$ and peer $p' \in Q_{j-1}$, where quorum $Q_i$ links to quorums $Q_j$ and $Q_{j-1}$ in the quorum topology, and $p$ and $p'$ are respectively located clockwise of all other peers in $Q_j$ and $Q_{j-1}$. $PK_{Q_j}$ is the quorum public key of $Q_j$ generated using DKG, and $ts$ is a time stamp for when this entry was created. Quorum $Q_j$ is responsible for the identifier space between identities $p$ and $p'$. The $\mathcal{RT}$ entries of $Q_i$ are set such that the complete identifier space is covered by them.



5.2 Adding Query Privacy to RCP-I: RCP$_{qp}$-I

Protocol RCP-I works deterministically. Here, we include a privacy preserving mechanism for queries in RCP-I using the OT protocol described in Section 4.3. The enhanced protocol (RCP$_{qp}$-I) appears in Figure 3, which we outline as follows.

Assume that $p \in Q_1$ is searching for a key and the target is a set of peers $D \subseteq Q_\ell$. Let the search path go through quorums $Q_1, \ldots, Q_\ell$. Peer $p$ begins by sending a request $[p | p_{addr} | ts_1]$ to all peers in its quorum $Q_1$, where $ts_1$ is a time stamp. Unlike the original RCP-I, the key corresponding to the intended destination of the message is not included here. Each honest peer $q \in Q_1$ checks if $p$'s request follows the rule set as described in Section 3.2. If there is no violation, $q$ sends its signature share to $p$, who interpolates those shares to generate a signature $S_1 = [p | p_{addr} | ts_1]_{sk_{Q_1}}$. In each intermediate step ($i = 2$ to $\ell - 1$), $p$ sends its most recent signature $S_{i-1}$ and a new time stamp $ts_i$ to each peer $q \in Q_i$. Since $Q_i$ is linked to $Q_{i-1}$ in the quorum topology, each peer $q$ knows the public key $PK_{Q_{i-1}}$ to verify $S_{i-1}$. If $S_{i-1}$ is verified and $ts_i$ is valid, peer $q$ sends back its signature share on $[p | p_{addr} | ts_i]$. Peer $p$ collects the shares to form $S_i$ and majority filters on the routing information for $Q_{i+1}$. If verification of $S_i$ fails, peer $p$ sends all shares back to every party in $Q_i$, who help $p$ by filtering the invalid shares out. Finally, for $Q_\ell$, $p$ sends $m$ along with $S_{\ell-1}$ to peers in the target set $D$ in $Q_\ell$.



Initial Step: $p \in Q_1$ with quorum $Q_1$

  peer $p$: Send a request $[p|p_{addr}|ts_1]$.
  every peer $q \in Q_1$: If the request is legitimate, reply with a signature share.

Intermediate Steps: $p \in Q_1$ with quorum $Q_i$ for $i = 2$ to $\ell - 1$

  peer $p$: Interpolate $S_{i-1} = [p|p_{addr}|ts_{i-1}]_{sk_{Q_{i-1}}}$ using the received shares and send $S_{i-1}$ and
    a new $ts_i$. Request an OT initiation.
  every peer $q \in Q_i$: Verify $S_{i-1}$ using $PK_{Q_{i-1}}$ and validate $ts_i$. If successful, send a signature share,
    an OT-setup message, the ranges in $\mathcal{RT}$ of $Q_i$ and the entry-wise encrypted $\mathcal{RT}$ of $Q_i$.
  peer $p$: Interpolate $S_i = [p|p_{addr}|ts_i]_{sk_{Q_i}}$ using the received shares and verify it using $PK_{Q_i}$.
    If invalid, send all signature shares back. Send an OT-request for the index corresponding to the
    searched key.
  every peer $q \in Q_i$: Verify all shares using $PK_{Q_i}$ and inform $p$ of valid shares. Send an OT-response.
  peer $p$: Use the received OT-responses, if any, to determine the next quorum $Q_{i+1}$.

Final Step: $p \in Q_1$ with quorum $Q_\ell$

  peer $p$: Send $S_{\ell-1}$ along with its request $m$ to the target set $D \subset Q_\ell$.

Figure 3: RCPqp-I: RCP-I with Query Privacy

It still remains to see how $Q_i$ tells $p$ the correct $Q_{i+1}$ as the next quorum without knowing the key
being searched for. We accomplish this using the OT protocol. Along with $S_{i-1}$ and $ts_i$, $p$ also sends an
OT-initiation request to every peer $q \in Q_i$. Peer $q$ responds with the entry-wise symmetrically
encrypted (AES) routing table $\mathcal{RT}_{Q_i}$, the OT-setup message, and the upper and lower bounds of the ranges
in $\mathcal{RT}_{Q_i}$. Note that since all quorum members use the same randomness (due to the use of a PRF for which
everyone holds the private key), the messages from all honest parties will be the same. Peer $p$ determines
an index in $\mathcal{RT}_{Q_i}$ for the next quorum by searching for the key in the received ranges and sends an OT-request
for that index. Peer $q$ then computes and sends the OT-response. Using this response, peer $p$ obtains the
symmetric key corresponding to the queried index and decrypts the appropriate entry in $\mathcal{RT}_{Q_i}$ to determine
the next quorum $Q_{i+1}$. Any wrongdoing by Byzantine peers in the $\mathcal{RT}_{Q_i}$ range tables, the encrypted $\mathcal{RT}_{Q_i}$ blocks,
and the OT executions is taken care of by the majority action. As $p$ knows $Q_2$ from its own routing table,
there is no OT involved in the initial step.
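The requester-side bookkeeping in this step can be sketched as follows: peer p locates the index of the range containing its key, and accepts a decrypted routing entry only if a majority of the contacted peers agree on it. This is a minimal Python sketch; the function names, the representation of the ranges by their sorted lower bounds, and the strict-majority rule are assumptions made for illustration, and the OT and AES steps themselves are omitted.

```python
from bisect import bisect_right
from collections import Counter


def index_for_key(range_starts, key):
    """Index of the routing-table range that contains `key`.

    `range_starts` holds the sorted lower bounds of the ranges; range i is assumed
    to cover [range_starts[i], range_starts[i + 1]), and the last range wraps
    around the ring to the first lower bound.
    """
    i = bisect_right(range_starts, key) - 1
    return i % len(range_starts)  # a key below the first bound falls in the last range


def majority(responses):
    """Value reported by a strict majority of peers, or None if there is none."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) // 2 else None


range_starts = [10, 40, 90, 200]           # hypothetical range bounds sent by the quorum
ot_index = index_for_key(range_starts, 57)  # index to request via OT
next_quorum = majority(["Q7", "Q7", "Q3"])  # decrypted entries from three peers
```

Because every honest peer derives the same OT messages from the shared PRF key, honest responses are identical and a plain equality-based majority vote is enough to filter out Byzantine answers.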

Notice that it is also possible for peer $p$ to use OT in the final step while communicating with the target
set $D$ in $Q_\ell$, if the privacy application demands it. In that case, the target set $D$ only knows that the queried
key is one of its keys, but cannot determine the exact key.

Initial Step: $p \in Q_1$ with quorum $Q_1$

  peer $p$: Send a request $[p|p_{addr}|ts]$.
  every peer $q \in Q_1$: If the request is legitimate, reply with a signature share.
  peer $p$: Verify and interpolate the received shares to form $M_1 = [p|p_{addr}|ts]_{sk_{Q_1}}$.

Intermediate Steps: $p \in Q_1$ with quorum $Q_i$ for $i = 2$ to $\ell - 1$

  peer $p$: Select a peer $q_i \in Q_i$ uniformly at random without replacement. Send $M_{i-1}$ and
    request an OT initiation.
  selected peer $q_i \in Q_i$: For $j = i - 1$ down to 2, verify $PK_{Q_{j-1}}$ using $PK_{Q_j}$, and verify $M_1$ using
    $PK_{Q_1}$. If successful, send $[PK_{Q_{i-1}}]_{sk_{Q_i}}$, the ranges in $\mathcal{RT}$ of $Q_i$ and the entry-wise encrypted
    (signed) $\mathcal{RT}$ of $Q_i$.
  peer $p$: Send an OT-request for the index corresponding to the searched key.
  selected peer $q_i \in Q_i$: Send an OT-response back.
  peer $p$: If $PK_{Q_{i+1}}$, $\mathcal{RT}_{Q_{i+1}}$ (computed from the OT-response) and $PK_{Q_{i-1}}$ verify, compute
    $M_i = [M_{i-1}|[PK_{Q_{i-1}}]_{sk_{Q_i}}]$ and determine the next quorum $Q_{i+1}$ from $\mathcal{RT}_{Q_{i+1}}$. Otherwise, or if
    there is a timeout, choose $q'_i \in_R Q_i$ and repeat.

Final Step: $p \in Q_1$ with quorum $Q_\ell$

  peer $p$: Send $M_{\ell-1}$ along with its request $m$ to the target set $D \subset Q_\ell$.

Figure 4: RCPqp-II: RCP-II with Query Privacy

The correctness of protocol RCPqp-I follows directly from that of protocol RCP-I and we refer the
readers to [53] for a detailed proof. Although our privacy-preserving approach sends encrypted routing
tables (a few kilobytes in size) where RCP-I sends individual routing table entries, this does
not affect the message complexity of the protocol. The message complexity of protocol RCPqp-I remains
exactly the same as that of protocol RCP-I, namely $O(\log^2 n)$. We discuss the increase in computational
cost and other systems matters in Section 6.

5.3 Adding Query Privacy to RCP-II: RCPqp-II

Protocol RCP-II utilizes signed routing table ($\mathcal{RT}$) information and reduces the message complexity of
protocol RCP-I by a linear factor (in expectation) using a uniformly random selection of peers in the quorums.
Here, all $\mathcal{RT}$ entries are signed separately by the quorum whenever $\mathcal{RT}$s are modified. In particular, every
peer in the quorum, using its DKG private key share, generates and sends a signature share; the shares are
then interpolated to obtain signed $\mathcal{RT}$ entries. The OT setup and the OT protocol remain exactly the same
as in RCPqp-I. The enhanced protocol (RCPqp-II) appears in Figure 4, which we outline as follows.

Initially, for simplicity, assume that peers act correctly. The initial step, where $p$ communicates within
its quorum, $Q_1$, remains exactly the same. Each peer in $Q_1$ receives $[p|p_{addr}|ts]$ from $p$. If the request does
not violate the rule set, then peer $p$ receives signature shares and computes $M_1 = [p|p_{addr}|ts]_{sk_{Q_1}}$. Next, $p$
knows the membership of $Q_2$, which belongs to its $\mathcal{RT}$, and selects a peer $q_2 \in Q_2$ uniformly at random
without replacement. Peer $p$ sends $M_1$ to $q_2$. The correct $q_2$ verifies $M_1$ using $PK_{Q_1}$, and replies with
$[PK_{Q_1}]_{sk_{Q_2}}$ and $[\mathcal{RT}_{Q_3}]_{sk_{Q_2}}$. Here, $[PK_{Q_{i-1}}]_{sk_{Q_i}}$ denotes the quorum public key of $Q_{i-1}$ signed by quorum
$Q_i$, as neighboring quorums know each other's public keys, and $[\mathcal{RT}_{Q_{i+1}}]_{sk_{Q_i}}$ denotes the routing entry for
$Q_{i+1}$ signed by $Q_i$. Peer $p$ verifies $[PK_{Q_1}]_{sk_{Q_2}}$ and $[\mathcal{RT}_{Q_3}]_{sk_{Q_2}}$ and checks if the time stamp is valid. If
so, $p$ constructs $M_2 = [M_1|[PK_{Q_1}]_{sk_{Q_2}}]$. The idea is to allow some peer in $Q_3$ to verify $PK_{Q_1}$ and $M_1$
using a signature chain. Further, $p$ can check the response from some peer in $Q_3$ in the next step using
$PK_{Q_3}$ included in $\mathcal{RT}_{Q_3}$. This process repeats with minor changes for the remaining steps until $p$ reaches
the destination quorum $Q_\ell$. If any peer does not respond in the amount of time predefined by the weak
synchrony assumption [12] (as described in Section 3.1) or responds incorrectly, the protocol proceeds by
choosing another peer in the quorum uniformly at random. Note that any attempt by a malicious peer to
return incorrect information is detectable.

It still remains to see how the OT executions for the key are performed such that a correct peer $q_i$ in $Q_i$ can
give routing information for $Q_{i+1}$ to peer $p$. For this, peer $p$ sends an OT initiation to peer $q_i$ along with
$M_{i-1}$. Upon verification of the signature chain, $q_i$ replies with $[PK_{Q_{i-1}}]_{sk_{Q_i}}$, the ranges in the $\mathcal{RT}$ of $Q_i$, the
entry-wise encrypted (signed) $\mathcal{RT}$ of $Q_i$, and the OT-setup message. Note that these encryptions are done locally
at peers, and are applied to both the $\mathcal{RT}$ entries and their signatures. Peer $p$ then determines the index corresponding
to the key it is searching for and sends an OT-request for that index. Peer $q_i$ then computes and sends an
OT-response. Using this response, peer $p$ obtains the symmetric key corresponding to the queried index,
decrypts the appropriate entry in $\mathcal{RT}_{Q_i}$, checks the signature on the resulting plaintext, and thus determines
the next quorum $Q_{i+1}$.
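The peer-selection loop that distinguishes this protocol from the first one can be sketched in a few lines of Python. This is an illustrative sketch only: `ask` and `is_valid` are hypothetical callbacks standing in for the network exchange (including the OT execution) and for the signature and time-stamp checks, and a timeout is modeled by `ask` returning `None`.

```python
import random


def query_quorum(peers, ask, is_valid, rng=random):
    """Contact peers in uniformly random order, without replacement, until one
    returns a response that verifies.

    Returns the accepted response and the number of peers contacted.
    """
    order = list(peers)
    rng.shuffle(order)  # a uniformly random order gives sampling without replacement
    for contacted, peer in enumerate(order, start=1):
        response = ask(peer)  # None models a timeout
        if response is not None and is_valid(response):
            return response, contacted
    raise RuntimeError("no peer in the quorum returned a valid response")


# Hypothetical quorum in which only two of five peers answer correctly.
honest = {"q2", "q5"}
response, contacted = query_quorum(
    ["q1", "q2", "q3", "q4", "q5"],
    ask=lambda peer: "signed-entry" if peer in honest else None,
    is_valid=lambda r: r == "signed-entry",
)
```

Since incorrect answers are detectable, a single verified response suffices; when most peers in a quorum are honest, the loop is expected to stop after a small constant number of contacts.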

Similar to RCPqp-I, if required, it is possible for peer $p$ to use OT in the final step while communicating
with the target set $D$ in $Q_\ell$. The correctness of the protocol follows directly from that of the original RCP-II
protocol, and we refer the readers to [53] for a detailed proof. The message complexity of the enhanced
protocol remains exactly the same as that of the original protocol, namely $O(\log n)$ in expectation. We
discuss the increase in computational cost and other systems matters in Section 6.

6 Analysis and Discussion 

As discussed in Section 5, our protocols do not increase the message complexity of their original
counterparts RCP-I and RCP-II. In this section, we consider the increase in computation due to the query-privacy
mechanism and find it to be nominal. We also analyze possible system-level attacks on our protocols.

6.1 Additions to Computational Costs 

Query privacy does not come without some additional computation. However, for our choice of OT, this 
increase is insignificant as compared to the computations already done in the original RCP-I and RCP-II 
protocols. 

In both the RCPqp-I and RCPqp-II protocols, a requesting peer $p$ has to perform only two additional
exponentiations at each privacy-preserving $\mathcal{RT}$ entry retrieval, while a responding peer $q_i$ in quorum $Q_i$
must perform one additional exponentiation. Peers in $Q_i$ also have to perform $\nu$ exponentiations for an
OT setup, where $\nu$ is the size of $\mathcal{RT}$. However, they can be batch-computed and may also be reused in
$\nu$ requests. In terms of computation, our privacy-preserving mechanism remains exactly the same in both
protocols, RCPqp-I and RCPqp-II. This results from peer $p$ running the same instance of OT with all
peers in the quorum in RCPqp-I with the help of the PRF-based technique discussed in Section 4.3.



Timing values computed using the pairing-based cryptography (PBC) library [28] indicate that one
exponentiation takes around 1 ms on a desktop machine. Given that the communication time for the original
RCP-I and RCP-II protocols is greater than 3 seconds (refer to Young et al. [51] for a detailed discussion),
the cost of these exponentiations is insignificant. In terms of system load, a DKG execution in RCP-I and
RCP-II on average requires 2 CPU seconds, and a threshold signature generation and verification takes about
6 CPU ms. Therefore, our OT executions do not increase the system load by any significant fraction. Note
that the OT protocol also involves a few group multiplications, PRF executions, symmetric encryptions and
hashes. These computations take only a few μs, so we ignore their costs in our discussion.
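The comparison above can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses the figures quoted in the text (about 1 ms per exponentiation, more than 3 seconds of communication time); the path length of 20 hops, the decision to count the requester plus one responder per hop, and the reading of the 3-second figure as the time for a whole lookup are assumptions made purely for illustration. The amortized OT-setup exponentiations are left out.

```python
EXP_MS = 1.0       # one exponentiation, per the PBC timing quoted above
COMM_MS = 3000.0   # lower bound on the communication time quoted above


def ot_overhead_ms(hops, requester_exps=2, responder_exps=1):
    """Added exponentiation time along a path, counting the requester and one
    responder per hop (OT-setup exponentiations are not included)."""
    return hops * (requester_exps + responder_exps) * EXP_MS


hops = 20  # hypothetical path length
overhead = ot_overhead_ms(hops)
fraction = overhead / COMM_MS
print(f"{overhead:.0f} ms added, {fraction:.1%} of the communication time")
```

Under these assumptions the added computation is about 60 ms, roughly 2% of the quoted communication time.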

6.2 System-level Attacks on Query Privacy 

Although OT hides the queried key completely in the cryptographic sense, there can be system-level attacks 
that leak some information about the key. 



A range estimation attack defined by Wang, Mittal and Borisov [50] that reduces the privacy provided in
NISAN [35] could be applied to our RCPqp-I protocol. This attack is based on the fact that the Chord-like
DHT ring is directed and the requesting peer $p$ will not query a quorum succeeding the queried key except
in the first iteration. Therefore, an adversary that can observe the peer p contacting a sequence of quorums 
can put them together into a sequence to narrow down on the target range that peer p may reach. In this 
attack, the range only extends from the last contacted quorum having an adversarial peer to the largest jump 
possible at the end of first iteration. For NISAN, Wang et al. show that if at least 20% of nodes are under 
the adversary's control, the adversary may obtain a significant amount of information about the queried key. 
As indicated in Section 3.1, we consider the percentage of peers under the control of a single adversary to
be around 10%. Therefore, although this range estimation attack is possible, it is not particularly effective 
in our DHT setting. On the other hand, the curious peers in the intermediate quorums only see requests 
approved by one of their neighbors. This, along with the security provided by the OT protocol, ensures that 
nothing is revealed about the queried key to the curious intermediate quorums. 

As only an expected constant number of peers are contacted per intermediate quorum in our RCPqp-II
protocol, the range estimation attack by Wang et al. [50] is far less effective. However, query privacy for
our RCPqp-II protocol is slightly weaker in terms of the above-mentioned curious observer attack. This is
a direct consequence of the use of a signature chain to authorize a request from a peer $p$. Consider a peer $q_i$
from an intermediate quorum $Q_i$. Although $q_i$ may not be able to determine quorums from the public keys
in a chain, the length of the chain itself might give peer $q_i$ some information about possible key values. This
results from a property of Chord-like DHTs: generally each step brings a requester exponentially closer to
its destination. As an example, a shorter signature chain indicates that the destination quorum is probably
situated far from $Q_i$ in the key (or identifier) space, while a length nearly equal to $\log n$ indicates that the
destination quorum is probably nearby. This is, however, a weak heuristic attack, as path lengths of DHT
requests may vary significantly. Further, it is possible to mislead such a curious adversary by adding a few
fake signatures at the end of the chain. The requesting peer $p$ has to have this done by its quorum $Q_1$.

6.3 Crawling Attacks towards Spam Prevention 

As discussed in Section 2, usage of the iterative routing approach significantly improves robustness against
spamming attacks, since a spamming peer has to perform an equal amount of work as the rest of the system.
Young et al. [52] add further protections against spamming in RCP-I and RCP-II by not allowing the
adversarial peer to gather a large amount of routing information. They add the queried keys to requests.
As a result, an execution of RCP-I or RCP-II leads to the requester $p$ gaining information only about the $\ell$
quorums in its path. We concentrate on query privacy in this work and require that the queried key
remain completely hidden from every intermediate quorum $Q_i$ for $i \in [1, \ell - 1]$. This may lead to attacks
in which the adversarial peer $p$ obtains more routing information; we call these attacks crawling attacks.

In our RCPqp-I protocol, a malicious peer $p$ may try to obtain the entire $\mathcal{RT}$ of $Q_i$ by querying
different peers in $Q_i$ for different keys (or $\mathcal{RT}$ indices). As a result, the adversarial peer $p$ can acquire more
information than allowed by the rule set. It is possible to thwart this attack completely by adding
one communication round: here, $p$ also has to get its OT-request message (which is the same for all peers
in $Q_i$) signed by $Q_i$ in exactly the same way as its authorization request $[p|p_{addr}|ts_i]$. This ensures that $p$
can query the quorum for only one key (specifically, one index in $\mathcal{RT}$), and query privacy of the key
remains unaffected. This additional round does not change the message complexity of the protocol. We
do not include this defense mechanism in the protocol described in Figure 3, as the repercussions of this attack,
if any, may vary from system to system.

In our RCPqp-II protocol, similar crawling is possible. The adversarial peer $p$ may query different
peers in quorum $Q_i$ for different indices to obtain the complete $\mathcal{RT}$ of $Q_i$. However, unlike in RCPqp-I,
a malicious peer $p$ has to increase its effort linearly to obtain the complete $\mathcal{RT}$ of $Q_i$ in RCPqp-II, so
crawling is not an effective attack for the malicious peer $p$.

In both protocols, it is possible for a malicious peer p to alter the queried key while shifting from one 
quorum to the next, as there is no link between signed authorizations and the queried keys for privacy 
reasons. This may, however, lead to a peer p gaining more knowledge as it can continuously modify its key 
to traverse as much of the DHT as possible. This is an even weaker crawling attack than the one mentioned 
above, as the adversary has to perform a significant amount of work to gain any information. 

Notice that any information gained by the adversary in the above active attacks is still substantially 
smaller than information effortlessly available to it when PIR or trivial PIR are used. Finally, it may be pos- 
sible to stop the adversary p from gaining any additional information without revealing its key using com- 
putationally and communicationally demanding cryptographic primitives such as zero-knowledge proofs or 
conditional OT [15]. However, we find that their inclusions are not essential, and may be even impractical, 
for DHT-based systems. 

7 Conclusion 

In this paper, we have introduced the concept of query privacy in the robust DHT architecture. We have
enhanced two existing robust communication protocols (RCP-I and RCP-II) over DHTs to preserve the
privacy of keys in DHT queries using an OT protocol. We reviewed the OT literature and chose a
theoretically non-optimal but practically efficient (in terms of use over DHTs) OT scheme. Using
this, we built two protocols (RCPqp-I and RCPqp-II), which obtain query privacy without any significant
increase in computation costs and message complexity in practice. Our privacy-preserving mechanism does
not change the underlying protocols' utility or efficacy in any way, and is also applicable to other DHT
communication architectures.

A The Oblivious Transfer Protocol



In this appendix, we provide an overview of the 1-out-of-$\nu$ OT protocol of Naor and Pinkas [32]. Security
of the construction is based on DDH in a group $G$ of prime order $|G|$. The proof of security uses the random
oracle model, i.e. the protocol uses a cryptographic hash function, $H$, which is then replaced by the random
oracle in the proof. Recall that the goal is for the server, $q$, to offer $\nu$ strings, $S_1, \ldots, S_\nu$, and for the chooser
$p$ to obtain the desired one, $S_\rho$, and nothing else. The basic idea of the OT protocol of [32] is to let $p$ provide
encryption keys $PK_i$ for $1 \le i \le \nu$; these are constructed such that $p$ can know at most one of the decryption
keys. The server then supplies $p$ with encryptions of each $S_i$ under $PK_i$. Details are included in the protocol
flow in Figure 5.

Setup (for $\nu$ invocations)

  peer $q$: Pick $r \in \mathbb{Z}_{|G|}$ uniformly at random and compute $a = g^r$; for $1 < i \le \nu$ pick $C_i$
    uniformly at random in $G$ and compute $C_i^r$. Send $a$ and $C_2, \ldots, C_\nu$ to $p$.

Online (single invocation)

  peer $p$ requesting $S_\rho$: Pick $k \in \mathbb{Z}_{|G|}$ uniformly at random and compute $PK_\rho = g^k$. If $\rho \ne 1$ compute
    $PK_1 = C_\rho / PK_\rho$. Send $PK_1$ to $q$.
  peer $q$ holding $S_1, \ldots, S_\nu$: Compute $PK_1^r$; then for $1 < i \le \nu$ compute $PK_i^r = C_i^r / PK_1^r$. Pick a random
    string $R$ and for $1 \le i \le \nu$ compute an encryption of $S_i$, $H(PK_i^r, R, i) \oplus S_i$; send
    all $\nu$ encryptions to $p$ along with $R$.
  peer $p$: Compute first $PK_\rho^r = a^k$ and then $H(PK_\rho^r, R, \rho)$; use this to decrypt the $\rho$th
    encryption and output the plaintext, $S_\rho$.

Figure 5: The 1-out-of-$\nu$ OT protocol of Naor and Pinkas

We now elaborate on the intuition behind the three messages:

1. OT-setup: $q$ picks a random DL instance, $a = g^r$, and sends this to $p$. Moreover, the parties agree on
$\nu - 1$ random group elements, $C_2, \ldots, C_\nu$. It is crucial that $p$ does not know their DLs, hence they are
picked by $q$ (who is allowed but not required to know the DLs).

2. OT-request: $p$ supplies $PK_1$ to $q$; for $1 < i \le \nu$, $PK_i$ is implicitly set to $C_i / PK_1$. $p$ constructs
$PK_1$ such that $PK_\rho$ has known DL; however, if $p$ could find the DL of any other key, $PK_i$, $p$ could
solve a DDH problem in $G$.

3. OT-response: $q$ computes $PK_i^r = C_i^r / PK_1^r$ for all $i$. Since the $C_i^r$ may be precomputed, this requires
only a single exponentiation and $\nu - 1$ multiplications. $q$ then picks a uniformly random $\ell$-bit string,
$R$, where $\ell$ is chosen large enough (e.g. 200 bits) to ensure that $R$ will be distinct. Finally, for each
$1 \le i \le \nu$, $q$ computes an encryption of $S_i$ as $E_i = H(PK_i^r, R, i) \oplus S_i$. $R$ and the $E_i$ are sent to $p$,
who computes first $PK_\rho^r = a^k$ and then decrypts to obtain $S_\rho = E_\rho \oplus H(PK_\rho^r, R, \rho)$.

For more details along with the proof in the random oracle model, see [32]. As noted, $a$ and the $C_i^r$ may
be preprocessed during periods of low computational load. Moreover, the values may be used in multiple
instances of the OT protocol. "Refreshing" $r$ (i.e. $a$ and the $C_i^r$) every $\nu$ executions provides an amortized
complexity of two exponentiations per party per OT invocation. The setup message consists of $\nu$ group
elements, while the OT-request contains only a single one. Finally, the reply consists of the $\nu$ values $E_i$ plus $R$;
though strictly speaking these are not group elements, they may be viewed as such for the complexity
analysis. Hence, the overall communication is $2\nu + 2$ group elements.
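To make the three messages concrete, the following Python sketch runs the protocol end to end in a toy group. It is an illustration under stated assumptions, not a secure or reference implementation: the 11-bit safe prime `P`, the generator `G = 4`, the use of SHAKE-256 in place of the random oracle $H$, and all function names are choices made here for brevity, and exponents are drawn from $[1, |G| - 1]$ rather than all of $\mathbb{Z}_{|G|}$.

```python
import hashlib
import secrets

# Toy group (assumption, far too small to be secure): P = 2*Q + 1 is a safe prime
# and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4
NBYTES = (P.bit_length() + 7) // 8


def rand_exp():
    """Uniform exponent in [1, Q - 1]."""
    return secrets.randbelow(Q - 1) + 1


def H(elem, R, i, n):
    """Hash (PK_i^r, R, i) to an n-byte pad; SHAKE-256 stands in for the random oracle."""
    data = elem.to_bytes(NBYTES, "big") + R + i.to_bytes(4, "big")
    return hashlib.shake_256(data).digest(n)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def ot_setup(nu):
    """Server q: pick r, a = g^r, and C_2..C_nu together with the precomputed C_i^r."""
    r = rand_exp()
    a = pow(G, r, P)
    C = {i: pow(G, rand_exp(), P) for i in range(2, nu + 1)}
    C_r = {i: pow(c, r, P) for i, c in C.items()}
    return r, a, C, C_r


def ot_request(rho, C):
    """Chooser p: PK_rho = g^k; send PK_1 = C_rho / PK_rho (PK_rho itself if rho == 1)."""
    k = rand_exp()
    pk_rho = pow(G, k, P)
    pk_1 = pk_rho if rho == 1 else C[rho] * pow(pk_rho, -1, P) % P
    return k, pk_1


def ot_response(r, C_r, pk_1, strings):
    """Server q: one exponentiation, then E_i = H(PK_i^r, R, i) XOR S_i for every i."""
    pk1_r = pow(pk_1, r, P)
    inv = pow(pk1_r, -1, P)
    R = secrets.token_bytes(25)  # 200-bit random string
    E = {}
    for i, s in enumerate(strings, start=1):
        pk_i_r = pk1_r if i == 1 else C_r[i] * inv % P
        E[i] = xor(H(pk_i_r, R, i, len(s)), s)
    return R, E


def ot_decrypt(rho, k, a, R, E):
    """Chooser p: PK_rho^r = a^k, then unmask the rho-th ciphertext."""
    pk_rho_r = pow(a, k, P)
    return xor(H(pk_rho_r, R, rho, len(E[rho])), E[rho])


# The server offers four 16-byte strings (e.g. AES keys); the chooser retrieves the third.
nu = 4
aes_keys = [secrets.token_bytes(16) for _ in range(nu)]
r, a, C, C_r = ot_setup(nu)
k, pk_1 = ot_request(3, C)
R, E = ot_response(r, C_r, pk_1, aes_keys)
assert ot_decrypt(3, k, a, R, E) == aes_keys[2]
```

The message sizes match the count above: `ot_setup` produces $\nu$ group elements ($a$ and $C_2, \ldots, C_\nu$), `ot_request` sends one, and `ot_response` returns $\nu$ ciphertexts plus $R$. Modular inverses via `pow(x, -1, P)` require Python 3.8 or later.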

We remark that since we are transferring AES keys using OT, we could also directly use (some digest
of) $PK_i^r, R, i$ as the AES key. However, such ad hoc optimizations may easily introduce subtle flaws. The
security proof of Naor and Pinkas may easily be invalidated by even a minor optimization; hence, as the
gains are marginal, we prefer the original OT protocol to any ad hoc optimization.